Research Topic

Our research topic is Trend of Major Types of Crimes commited by White Males in the DC Area in 2016-2021. We chose this topic because we are interested in the impact of COVID-19 on crimes. We will use the data provided by the Metropolitan Police of DC regarding adult arrests over a time period stretching between 2016-2021.

Data Shaping

Read Data

We read the data .CSV files of adults arrest in DC area from 2016-2021

df_2016<-data.frame(read.csv("Arrests 2016 Public.csv"))
df_2017<-data.frame(read.csv("Arrests 2017 Public.csv"))
df_2018<-data.frame(read.csv("Arrests by Year, 2018.csv"))
df_2019<-data.frame(read.csv("Arrests by Year, 2019.csv"))
df_2020<-data.frame(read.csv("Arrests by Year 2020.csv"))
df_2021<-data.frame(read.csv("2021 Adult Arrests.csv"))

c16 <- c(colnames(df_2016))
c18 <- c(colnames(df_2018))

The column names of the data in 2016 and 2017 were not the same with others. The below table shows the column names of the data in 2016 and the data in 2016.

col # 2016 2018
1 Arrestee.Type Arrestee.Type
2 Arrest.Year Arrest.Year
3 Arrest.Date Arrest.Date
4 Arrest.Hour Arrest.Hour
5 CCN CCN
6 Arrest.Number. Arrest.Number.
7 Age Age
8 Defendant.PSA Defendant.PSA
9 Defendant.District Defendant.District
10 Defendant.Race Defendant.Race
11 Defendant.Ethnicity Defendant.Ethnicity
12 Defendant.Sex Defendant.Sex
13 Arrest.Category Arrest.Category
14 Charge.Description Charge.Description
15 Arrest.Location.PSA Arrest.Location.PSA
16 Arrest.Location.District Arrest.Location.District
17 Arrest.Location.Block.GeoX Arrest.Block.GEOX
18 Arrest.Location.Block.GeoY Arrest.Block.GEOY
19 Offense.GEOY Arrest.Latitude
20 Offense.GEOX Arrest.Longitude
21 Offense.PSA Offense.Location.PSA
22 Offense.District Offense.Location.District
23 Arrest.Latitude Offense.Block.GEOX
24 Arrest.Longitude Offense.Block.GEOY
25 Offense.Latitude Offense.Latitude
26 Offense.Longitude Offense.Longitude

Merge Multiple CSV Files and Drop Some Columns

The column names were same from the first column to the 14th column in both data. On the other hand, the name and order of 15th and latter columns were a bit different in those data. The latter columns were about locations, and we were not very interested in the detail location. Therefore, we deleted the latter columns except for the 16th and 22nd columns. In addition, we dropped CNN (col #5) and Arrest.Number. (col #6) because they were IDs and useless for our analysis.

The format of date was different from years; the data in 2016 and 2017 has the format like 2016-01-01, the data in 2018 to 2020 has the format like 1/1/2018, and the data in 2021 has the format like 2021/1/1. We coverted Since different date formats for different years are difficult to analyze, we will unify the date format to “yyyy-mm-dd”.

After deleting some columns and changing the date format, we binded data frames by rows.

# convert format
df_2018$Arrest.Date <- as.Date(df_2018$Arrest.Date, format = "%m/%d/%Y") %>% format()
df_2019$Arrest.Date <- as.Date(df_2019$Arrest.Date, format = "%m/%d/%Y") %>% format()
df_2020$Arrest.Date <- as.Date(df_2020$Arrest.Date, format = "%m/%d/%Y") %>% format()
df_2021$Arrest.Date <- as.Date(df_2021$Arrest.Date, format = "%Y/%m/%d") %>% format()

#bind df_2016 and df_2017, and delete some columns
df_16_17 <- rbind(df_2016, df_2017)[,-c(5,6,15,17:21,23:26)]
names(df_16_17)[c(13,14)] <- c('Arrest.Location.District','Offense.Location.District')   #rename columns

#bind df_2018 - df_2021, and delete some columns
df_18_21 <- rbind(df_2018, df_2019, df_2020, df_2021)[,-c(5,6,15,17:21,23:26)] 

DF<-rbind(df_16_17,df_18_21)

Correct Anomalies

Remove abnormal values

To see whether there were abnormal values, we created the table showing some statistics for numerical variables.

xkablesummary(subset(DF,select=c(Arrest.Year, Arrest.Hour, Age)))
Table: Statistics summary.
Arrest.Year Arrest.Hour Age
Min Min. :2016 Min. : 0.00 Min. : 18.00
Q1 1st Qu.:2017 1st Qu.: 6.00 1st Qu.: 25.00
Median Median :2018 Median :12.00 Median : 32.00
Mean Mean :2018 Mean :11.81 Mean : 35.19
Q3 3rd Qu.:2019 3rd Qu.:18.00 3rd Qu.: 43.00
Max Max. :2021 Max. :23.00 Max. :121.00

The maximum age was too old. 55 rows were assigned an age of over 100 years (117-121 ) in these data, and it seemed to be wrong. Therefore, we dropped these rows.

DF <- DF[!DF$Age>=100,]

Correct inconsistent values

Arrest.Category had some different values for 2021 and other years:

  • Data in 2021 had “Release Violations/Fugitive (Fug)” and “Release Violations/Fugitive (Warr)” although data in other years have “Release Violations/Fugitive” instead of them.
  • Data in 2021 had “Fraud and Financial Crimes (Frau)” although data in other years have “Fraud and Financial Crimes”.

Therefore, we coverted these values in 2021 into the correspond values in other years.

DF <- mutate(DF, Arrest.Category = gsub(Arrest.Category, pattern = "Release Violations/Fugitive.*", replacement = "Release Violations/Fugitive"))
DF <- mutate(DF, Arrest.Category = gsub(Arrest.Category, pattern = "Fraud and Financial Crimes.*", replacement = "Fraud and Financial Crimes"))

Remove Unnecessary Rows

Since we were interested in crimes committed by while males, we dropped rows where the value of Defendant.Race was not “White”. The structure of the final data is shown in the below table.

DF_WM <- subset(DF, subset = Defendant.Race=='WHITE' & Defendant.Sex=='MALE')

data.frame(column_name = names(DF_WM),
           class = sapply(DF_WM, typeof),
           first_values = sapply(DF_WM, function(x) paste0(head(x),  collapse = ", ")),
           row.names = NULL) %>% 
  kable("simple", caption = 'Data frame structure')
Data frame structure
column_name class first_values
Arrestee.Type character Adult Arrest, Adult Arrest, Adult Arrest, Adult Arrest, Adult Arrest, Adult Arrest
Arrest.Year integer 2016, 2016, 2016, 2016, 2016, 2016
Arrest.Date character 2016-01-01, 2016-01-01, 2016-01-01, 2016-01-01, 2016-01-01, 2016-01-01
Arrest.Hour integer 0, 0, 1, 1, 13, 2
Age integer 39, 27, 27, 26, 48, 25
Defendant.PSA character Out of State, Out of State, Out of State, Out of State, 404, Out of State
Defendant.District character Out of State, Out of State, Out of State, Out of State, 4D, Out of State
Defendant.Race character WHITE, WHITE, WHITE, WHITE, WHITE, WHITE
Defendant.Ethnicity character UNKNOWN, NOT HISPANIC, HISPANIC, NOT HISPANIC, NOT HISPANIC, HISPANIC
Defendant.Sex character MALE, MALE, MALE, MALE, MALE, MALE
Arrest.Category character Simple Assault, Simple Assault, Driving/Boating While Intoxicated, Simple Assault, Simple Assault, Simple Assault
Charge.Description character Threats To Do Bodily Harm -misd, Simple Assault, Driving While Intoxicated -2nd Off, Simple Assault, Simple Assault, Simple Assault
Arrest.Location.District character 2D, 3D, 4D, 5D, 1D, 3D
Offense.Location.District character 2D, 3D, 4D, 5D, 1D, 3D

EDA

The number of crimes is as follows. Crime occurrences have decreased after COVID-19.

year the number of crimes
2016 2620
2017 2636
2018 2297
2019 2191
2020 1425
2021 1109
2016 - 2019 (before COVID-19) 2436
2020 - 2021 (after COVID-19) 1267

Number of Each Crime

We created some bar plots to see the number of occurrences per type of crime.
The Bar plot of crimes in 2016 - 2021 is as follows:

ggplot(DF_WM, aes(forcats::fct_infreq(Arrest.Category))) +
  ggtitle("Figure 1: Bar plot of crimes in 2016 - 2021") + xlab("crime types") + geom_bar() +
  theme(axis.text.x = element_text(angle = 60, hjust = 1))

The Bar plots of crimes before COVID-19 (2016 - 2019) and after COVID-19 (2020 - 2021) are as follows:

ggplot(subset(DF_WM,Arrest.Year <= 2019), aes(forcats::fct_infreq(Arrest.Category))) +
  ggtitle("Figure 2: Bar plot of crimes befor COVID-19") + xlab("crime types") + geom_bar() +
  theme(axis.text.x = element_text(angle = 60, hjust = 1))

ggplot(subset(DF_WM,Arrest.Year > 2019), aes(forcats::fct_infreq(Arrest.Category))) +
  ggtitle("Figure 3: Bar plot of crimes after COVID-19") + xlab("crime types") + geom_bar() +
  theme(axis.text.x = element_text(angle = 60, hjust = 1))

The Bar plots of crimes in each year are as follows:

ggplot(subset(DF_WM,Arrest.Year == 2016), aes(forcats::fct_infreq(Arrest.Category))) +
  ggtitle("Figure 4: Bar plot of crimes in 2016") + xlab("crime types") + geom_bar() +
  theme(axis.text.x = element_text(angle = 60, hjust = 1))

ggplot(subset(DF_WM,Arrest.Year == 2017), aes(forcats::fct_infreq(Arrest.Category))) +
  ggtitle("Figure 5: Bar plot of crimes in 2017") + xlab("crime types") + geom_bar() +
  theme(axis.text.x = element_text(angle = 60, hjust = 1))

ggplot(subset(DF_WM,Arrest.Year == 2018), aes(forcats::fct_infreq(Arrest.Category))) +
  ggtitle("Figure 6: Bar plot of crimes in 2018") + xlab("crime types") + geom_bar() +
  theme(axis.text.x = element_text(angle = 60, hjust = 1))

ggplot(subset(DF_WM,Arrest.Year == 2019), aes(forcats::fct_infreq(Arrest.Category))) +
  ggtitle("Figure 7: Bar plot of crimes in 2019") + xlab("crime types") + geom_bar() +
  theme(axis.text.x = element_text(angle = 60, hjust = 1))

ggplot(subset(DF_WM,Arrest.Year == 2020), aes(forcats::fct_infreq(Arrest.Category))) +
  ggtitle("Figure 8: Bar plot of crimes in 2020") + xlab("crime types") + geom_bar() +
  theme(axis.text.x = element_text(angle = 60, hjust = 1))

ggplot(subset(DF_WM,Arrest.Year == 2021), aes(forcats::fct_infreq(Arrest.Category))) +
  ggtitle("Figure 9: Bar plot of crimes in 2021") + xlab("crime types") + geom_bar() +
  theme(axis.text.x = element_text(angle = 60, hjust = 1))

“Offenses Against Family & Children” have been increasing after COVD-19.

cnt_2016 <- table(subset(DF_WM,Arrest.Year==2016)$Arrest.Category)
pos_2016 <- order(cnt_2016, decreasing = TRUE)[1:6]
crime_2016 <- names(cnt_2016)[pos_2016]

cnt_2017 <- table(subset(DF_WM,Arrest.Year==2017)$Arrest.Category)
pos_2017 <- order(cnt_2017, decreasing = TRUE)[1:7]
crime_2017 <- names(cnt_2017)[pos_2017]

cnt_2018 <- table(subset(DF_WM,Arrest.Year==2018)$Arrest.Category)
pos_2018 <- order(cnt_2018, decreasing = TRUE)[1:6]
crime_2018 <- names(cnt_2018)[pos_2018]

cnt_2019 <- table(subset(DF_WM,Arrest.Year==2019)$Arrest.Category)
pos_2019 <- order(cnt_2019, decreasing = TRUE)[1:7]
crime_2019 <- names(cnt_2019)[pos_2019]

cnt_2020 <- table(subset(DF_WM,Arrest.Year==2020)$Arrest.Category)
pos_2020 <- order(cnt_2020, decreasing = TRUE)[1:7]
crime_2020 <- names(cnt_2020)[pos_2020]

cnt_2021 <- table(subset(DF_WM,Arrest.Year==2021)$Arrest.Category)
pos_2021 <- order(cnt_2021, decreasing = TRUE)[1:7]
crime_2021 <- names(cnt_2021)[pos_2021]

The top 6 crimes (or 7 crimes when ‘Other Crimes’ are included) in each year are as follows.

Rank 2016 2017 2018 2019 2020 2021
1 Simple Assault Simple Assault Simple Assault Simple Assault Simple Assault Simple Assault
2 Traffic Violations Traffic Violations Traffic Violations Traffic Violations Driving/Boating While Intoxicated Traffic Violations
3 Release Violations/Fugitive Release Violations/Fugitive Release Violations/Fugitive Prostitution Release Violations/Fugitive Driving/Boating While Intoxicated
4 Driving/Boating While Intoxicated Driving/Boating While Intoxicated Driving/Boating While Intoxicated Driving/Boating While Intoxicated Traffic Violations Release Violations/Fugitive
5 Liquor Law Violations Other Crimes Narcotics Release Violations/Fugitive Offenses Against Family & Children Other Crimes
6 Narcotics Disorderly Conduct Theft Other Crimes Other Crimes Offenses Against Family & Children
7 NA Liquor Law Violations NA Theft Narcotics Damage to Property

Time Series Change in the Number of Major Crimes

To see the trend of the above major crimes, we created a line plot as follows.

SA_cnt <- sapply(unique(DF_WM[DF_WM$Arrest.Category=='Simple Assault',]$Arrest.Year),
                       function(x){sum(DF_WM[DF_WM$Arrest.Category=='Simple Assault',]$Arrest.Year==x)})
TV_cnt <- sapply(unique(DF_WM[DF_WM$Arrest.Category=='Traffic Violations',]$Arrest.Year),
                       function(x){sum(DF_WM[DF_WM$Arrest.Category=='Traffic Violations',]$Arrest.Year==x)})
RV_cnt <- sapply(unique(DF_WM[DF_WM$Arrest.Category=='Release Violations/Fugitive',]$Arrest.Year),
                       function(x){sum(DF_WM[DF_WM$Arrest.Category=='Release Violations/Fugitive',]$Arrest.Year==x)})
DI_cnt <- sapply(unique(DF_WM[DF_WM$Arrest.Category=='Driving/Boating While Intoxicated',]$Arrest.Year),
                       function(x){sum(DF_WM[DF_WM$Arrest.Category=='Driving/Boating While Intoxicated',]$Arrest.Year==x)})
N_cnt <- sapply(unique(DF_WM[DF_WM$Arrest.Category=='Narcotics',]$Arrest.Year),
                       function(x){sum(DF_WM[DF_WM$Arrest.Category=='Narcotics',]$Arrest.Year==x)})
LV_cnt <- sapply(unique(DF_WM[DF_WM$Arrest.Category=='Liquor Law Violations',]$Arrest.Year),
                       function(x){sum(DF_WM[DF_WM$Arrest.Category=='Liquor Law Violations',]$Arrest.Year==x)})
T_cnt <- sapply(unique(DF_WM[DF_WM$Arrest.Category=='Theft',]$Arrest.Year),
                       function(x){sum(DF_WM[DF_WM$Arrest.Category=='Theft',]$Arrest.Year==x)})
DV_cnt <- sapply(unique(DF_WM[DF_WM$Arrest.Category=='Offenses Against Family & Children',]$Arrest.Year),
                       function(x){sum(DF_WM[DF_WM$Arrest.Category=='Offenses Against Family & Children',]$Arrest.Year==x)})
DC_cnt <- sapply(unique(DF_WM[DF_WM$Arrest.Category=='Disorderly Conduct',]$Arrest.Year),
                       function(x){sum(DF_WM[DF_WM$Arrest.Category=='Disorderly Conduct',]$Arrest.Year==x)})
P_cnt <- sapply(unique(DF_WM[DF_WM$Arrest.Category=='Prostitution',]$Arrest.Year),
                       function(x){sum(DF_WM[DF_WM$Arrest.Category=='Prostitution',]$Arrest.Year==x)})
DP_cnt <- sapply(unique(DF_WM[DF_WM$Arrest.Category=='Damage to Property',]$Arrest.Year),
                       function(x){sum(DF_WM[DF_WM$Arrest.Category=='Damage to Property',]$Arrest.Year==x)})
year_lst <- 2016:2021

major_crimes_df <- data.frame(year_lst, SA_cnt, TV_cnt, RV_cnt, DI_cnt, N_cnt, LV_cnt, T_cnt, DV_cnt, DC_cnt, P_cnt, DP_cnt)

colnames(major_crimes_df) <- c('Year', 'Simple Assault', 'Traffic Violations', 'Release Violations/Fugitive', 'Driving/Boating While Intoxicated',
                               'Narcotics', 'Liquor Law Violations', 'Theft', 'Offenses Against Family & Children', 'Disorderly Conduct', 'Prostitution', 'Damage to Property')

major_crimes_df2 <- major_crimes_df %>% gather(key = 'Crimes', value = "Count", -Year)

ggplot(data=major_crimes_df2, aes(x=Year, y=Count, color=Crimes)) +
  geom_line() + geom_point()

“Simple Assault”, “Traffic Violations”, and “Theft” have clearly declined since 2020. On the other, “Offenses Against Family & Children” has increased in 2020 and 2021 compared to previous years. COVID-19 seems to be related to these trend change. We posed the following SMART QUESTION, and we will analyze these four crimes in detail in the following.

Is there a significant difference in “Simple Assault”, “Traffic Violations”, “Theft”, and “Offenses Against Family & Children” trends among adult white males within the DC area between 2016 and 2021, and could COVID protocols play a role in these trend shifts?

Analysis

Since crime is likely to be a rare event, the number of occurrences per day of a given crime is expected to follow Poisson distribution. Poisson distribution is a distribution used to describe the distribution of the number of rare phenomena when a large number of them are observed. If a distribution follows Poisson distribution, and the average number of occurrences of the phenomenon is \(\lambda\), the probability that the phenomenon will occur \(x\) times is given by \[p(x) = \exp(-\lambda)\frac{\lambda^{x}}{x!}.\] In the following, we will estimate \(\lambda\) of each crime before and after COVID-19 to see there is a difference in crime trend.

Offenses Against Family & Children

Before COVID-19

The trend of “Offenses Against Family & Children,” Domestic Violence (DV), appears to have changed after COVID-19. The frequency table of DV before COVID-19 is as follows.

DF_WM_16_19 <- DF_WM[DF_WM$Arrest.Year%in%c(2016,2017,2018,2019),]
DF_WM_16_19_DV <- DF_WM_16_19[DF_WM_16_19$Arrest.Category=='Offenses Against Family & Children',]

# table of date and the number of occurrences
DV_day_16_19 <- sapply(unique(DF_WM_16_19_DV$Arrest.Date),
                       function(x){sum(DF_WM_16_19_DV$Arrest.Date==x)})
# of occurrences per day Frequency Relative frequency
0 1364 0.9336071
1 95 0.065024
2 2 0.0013689
3 0 0

We can calculate \(\lambda\) from the above table and \(\lambda = 0.0678\). We will plot the histogram and Poisson distribution with \(\lambda = 0.0678\) to check if they match or not.

x_DV <- 0:5
y_DV <- c(1364,95,2,0,0,0)
fx <- dpois(x=x_DV, lambda=99/(365*4+1))
data_DV <- data.frame(x_DV, y_DV, fx)

ggplot(data_DV, aes(x=x_DV,y=y_DV)) +
  ggtitle("Figure 13: Histogram of DV in 2016 - 2019") + 
  xlab("Number of occurrences per day") + 
  ylab("Frequency") +
  geom_bar(stat = "identity") 

ggplot(data_DV) +
  ggtitle("Figure 14: Relative frequency histogram of DV in 2016 - 2019 \n and Poisson distribution with lambda = 0.0678") + 
  xlab("Number of occurrences per day") + 
  ylab("Relative frequency") +
  geom_bar(aes(x=x_DV,y=y_DV/(365*4+1)), stat = "identity") +
  geom_line(aes(x=x_DV,y=fx), color='red') +
  geom_point(aes(x=x_DV,y=fx), color='red')

We can see that the Poisson distribution fits well with the histogram.

Next, we try to estimate \(99\%\) Confidence Interval of \(\lambda\). The variance of Poisson distribution is equal to its mean (\(\lambda\)). Therefore, \(99\%\) Confidence Interval of \(\lambda\) can be written as \[ \bar{x} - z_{*}\cdot\sqrt{\frac{\bar{x}}{n}} \leq \lambda \leq \bar{x} + z_{*}\cdot\sqrt{\frac{\bar{x}}{n}}, \] where \(\bar{x}\) is the sample mean, \(n\) is the sample size, and \(z_*\) is z-value corresponding to the \(99\%\) confidence interval, and the value is 2.58. From this expression, 99% Confidence Interval of \(\lambda\) for DV before COVID-19 is [0.05, 0.0856].

After COVID-19

The frequency table of DV after COVID-19 is as follows.

DF_WM_20_21 <- DF_WM[DF_WM$Arrest.Year%in%c(2020,2021),]
DF_WM_20_21_DV <- DF_WM_20_21[DF_WM_20_21$Arrest.Category=='Offenses Against Family & Children',]

# table of date and the number of occurrences
DV_day_20_21 <- sapply(unique(DF_WM_20_21_DV$Arrest.Date),
                       function(x){sum(DF_WM_20_21_DV$Arrest.Date==x)})
# of occurrences per day Frequency Relative frequency
0 680 0.9302326
1 47 0.0642955
2 1 0.001368
3 0 0
4 1 0.001368
5 0 0
0 0
44 0 0
45 1 0.001368
46 0 0
0 0
77 0 0
78 1 0.001368
79 0 0
0 0

There are two outliers (45 and 78) in the table. The dates of them are 2021-01-06 and 2020-06-01. Since these dates are correspond to “Capitol attack” and “George Floyd protests”, we will drop the value of these dates.

The calculated \(\lambda = 0.0725\). The histogram and the poisson distribution with \(\lambda = 0.0725\) are shown in Figure 16.

x_DV <- 0:5
y_DV <- c(680,47,1,0,1,0)
fx <- dpois(x=x_DV, lambda=53/(365*2+1))
data_DV <- data.frame(x_DV, y_DV, fx)

ggplot(data_DV, aes(x=x_DV,y=y_DV)) +
  ggtitle("Figure 15: Histogram of DV in 2020 - 2021") + 
  xlab("Number of occurrences per day") + 
  ylab("Frequency") +
  geom_bar(stat = "identity") 

ggplot(data_DV) +
  ggtitle("Figure 16: Reralive frequency histogram of DV in 2020 - 2021 \n and Pission distribution with lambda = 0.0725") + 
  xlab("Number of occurrences per day") + 
  ylab("Relative frequency") +
  geom_bar(aes(x=x_DV,y=y_DV/(365*2+1)), stat = "identity") +
  geom_line(aes(x=x_DV,y=fx), color='red') +
  geom_point(aes(x=x_DV,y=fx), color='red')

The Poisson distribution fits well with the histogram.

99% Confidence Interval of \(\lambda\) for DV after COVID-19 is [0.0465, 0.0985].

Comparing Confidence Intervals

Figure 17 shows the Confidence Intervals before and after COVID-19. There is overlap in the Confidence Intervals, and it is not possible to say that there was a change in the \(\lambda\) of “Offenses Against Family & Children” before or after COVID-19.

x <- c('Before COVID-19','After COVID-19')
pre_covid_interval <- c(99/(365*4+1)-2.58*(99/(365*4+1)/(356*4+1))**0.5, 99/(365*4+1)+ 2.58*(99/(365*4+1)/(356*4+1))**0.5)
post_covid_interval <- c(53/(365*2+1) - 2.58*(53/(365*2+1)/(356*2+1))**0.5, 53/(365*2+1) + 2.58*(53/(365*2+1)/(356*2+1))**0.5)
data_CI_DV <- data.frame(x,pre_covid_interval,post_covid_interval)

ggplot(data_CI_DV) +
  ggtitle("Figure 17: 99% Confidence Interval of lambda for DV") + 
  xlab("") +
  ylab("99% Confidence Interval of lambda") + 
  coord_flip() +
  geom_linerange(aes(x=x[1], ymin=pre_covid_interval[1], ymax=pre_covid_interval[2])) +
  geom_linerange(aes(x=x[2], ymin=post_covid_interval[1], ymax=post_covid_interval[2]))

Traffic Violations

Before COVID-19

The trend of “Traffic Violations” also appears to have changed after COVID-19. The frequency table of Traffic Violations before COVID-19 is as follows.

DF_WM_16_19_TV <- DF_WM_16_19[DF_WM_16_19$Arrest.Category=='Traffic Violations',]

# table of date and the number of occurrences
TV_day_16_19 <- sapply(unique(DF_WM_16_19_TV$Arrest.Date),
                       function(x){sum(DF_WM_16_19_TV$Arrest.Date==x)})
# of occurrences per day Frequency Relative frequency
0 602 0.4123288
1 530 0.3627652
2 225 0.1540041
3 77 0.0527036
4 22 0.0150582
5 4 0.0027379
6 1 6.844627^{-4}
7 0 0

The calculated \(\lambda = 0.907\). The histogram and the poisson distribution with \(\lambda = 0.907\) are shown in Figure 19.

x_TV <- 0:10
y_TV <- c(602,530,225,77,22,4,1,0,0,0,0)
fx <- dpois(x=x_TV, lambda=sum(TV_day_16_19)/(365*4+1))
data_TV <- data.frame(x_TV, y_TV, fx)

ggplot(data_TV, aes(x=x_TV,y=y_TV)) +
  ggtitle("Figure 18: Histogram of traffic violations in 2016 - 2019") + 
  xlab("Number of occurrences per day") + 
  ylab("Frequency") +
  geom_bar(stat = "identity") 

ggplot(data_TV) +
  ggtitle("Figure 19: Relative frequency histogram of traffic violations in 2016 - 2019 \n and Pission distribution with lambda = 0.907") + 
  xlab("Number of occurrences per day") + 
  ylab("Relative frequency") +
  geom_bar(aes(x=x_TV,y=y_TV/(365*4+1)), stat = "identity") +
  geom_line(aes(x=x_TV,y=fx), color="red") +
  geom_point(aes(x=x_TV,y=fx), color='red')

The Poisson distribution fits well with the histogram.

99% Confidence Interval of \(\lambda\) for Traffic Violations before COVID-19 is [0.842, 0.972].

After COVID-19

The frequency table of Traffic Violations after COVID-19 is as follows.

DF_WM_20_21_TV <- DF_WM_20_21[DF_WM_20_21$Arrest.Category=='Traffic Violations',]

# table of date and the number of occurrences
TV_day_20_21 <- sapply(unique(DF_WM_20_21_TV$Arrest.Date),
                       function(x){sum(DF_WM_20_21_TV$Arrest.Date==x)})
# of occurrences per day Frequency Relative frequency
0 546 0.746922
1 156 0.2134063
2 23 0.0314637
3 3 0.004104
4 3 0.004104
5 0 0

The calculated \(\lambda = 0.306\). The histogram and the poisson distribution with \(\lambda = 0.306\) are shown in Figure 21.

x_TV <- 0:10
y_TV <- c(546,156,23,3,3,0,0,0,0,0,0)
fx <- dpois(x=x_TV, lambda=sum(TV_day_20_21)/(365*2+1))
data_TV <- data.frame(x_TV, y_TV, fx)

ggplot(data_TV, aes(x=x_TV,y=y_TV)) +
  ggtitle("Figure 20: Histogram of traffic violations in 2020 - 2021") + 
  xlab("Number of occurrences per day") + 
  ylab("Frequency") +
  geom_bar(stat = "identity") 

ggplot(data_TV) +
  ggtitle("Figure 21: Relative frequency histogram of traffic violations in 2020 - 2021 \n and Pission distribution with lambda = 0.306") + 
  xlab("Number of occurrences per day") + 
  ylab("Relative frequency") +
  geom_bar(aes(x=x_TV,y=y_TV/(365*2+1)), stat = "identity") +
  geom_line(aes(x=x_TV,y=fx), color="red") +
  geom_point(aes(x=x_TV,y=fx), color='red')

The Poisson distribution fits well with the histogram.

99% Confidence Interval of \(\lambda\) for Traffic Violations before COVID-19 is [0.252, 0.358].

Comparing Confidence Intervals

Figure 22 shows the Confidence Intervals before and after COVID-19. There is no overlap in the Confidence Intervals, and there may have been a change in the Traffic Violations’ lambda before and after COVID-19.

x <- c('Before COVID-19','After COVID-19')
pre_covid_interval <- c(sum(TV_day_16_19)/(365*4+1) - 2.58*(sum(TV_day_16_19)/(365*4+1)/(356*4+1))**0.5,
                        sum(TV_day_16_19)/(365*4+1) + 2.58*(sum(TV_day_16_19)/(365*4+1)/(356*4+1))**0.5)
post_covid_interval <- c(sum(TV_day_20_21)/(365*2+1) - 2.58*(sum(TV_day_20_21)/(365*2+1)/(356*2+1))**0.5,
                         sum(TV_day_20_21)/(365*2+1) + 2.58*(sum(TV_day_20_21)/(365*2+1)/(356*2+1))**0.5)
data_CI_TV <- data.frame(x,pre_covid_interval,post_covid_interval)

ggplot(data_CI_TV) +
  ggtitle("Figure 22: 99% Confidence Interval of lambda for Traffic Violations") + 
  ylab("99% Confidence Interval of lambda") + 
  xlab("") +
  coord_flip() +
  geom_linerange(aes(x=x[1], ymin=pre_covid_interval[1], ymax=pre_covid_interval[2])) +
  geom_linerange(aes(x=x[2], ymin=post_covid_interval[1], ymax=post_covid_interval[2]))

Simple Assault

Before COVID-19

DF_WM_16_19_SA <- DF_WM_16_19[DF_WM_16_19$Arrest.Category=='Simple Assault',]

# table of date and the number of occurrences
SA_day_16_19 <- sapply(unique(DF_WM_16_19_SA$Arrest.Date),
                       function(x){sum(DF_WM_16_19_SA$Arrest.Date==x)})
# of occurrences per day Frequency Relative frequency
0 438 0.2997947
1 479 0.3278576
2 284 0.1943874
3 156 0.1067762
4 76 0.0520192
5 13 0.008898
6 8 0.0054757
7 4 0.0027379
8 1 6.844627^{-4}
9 2 0.0013689
10 0 0
x_SA <- 0:10
y_SA <- c(438,479,284,156,76,13,8,4,1,2,0)
fx <- dpois(x=x_SA, lambda=sum(SA_day_16_19)/(365*4+1))
data_SA <- data.frame(x_SA, y_SA, fx)

ggplot(data_SA, aes(x=x_SA,y=y_SA)) +
  ggtitle("Figure 23: Histogram of Simple Assault in 2016 - 2019") + 
  xlab("Number of occurrences per day") + 
  ylab("Frequency") +
  geom_bar(stat = "identity") 

ggplot(data_SA) +
  ggtitle("Figure 24: Relative frequency histogram of Simple Assault in 2016 - 2019 \n and Poisson distribution with lambda = 1.36") + 
  xlab("Number of occurrences per day") + 
  ylab("Relative frequency") +
  geom_bar(aes(x=x_SA,y=y_SA/(365*4+1)), stat = "identity") +
  geom_line(aes(x=x_SA,y=fx), color='red') +
  geom_point(aes(x=x_SA,y=fx), color='red')

After COVID-19

DF_WM_20_21_SA <- DF_WM_20_21[DF_WM_20_21$Arrest.Category=='Simple Assault',]

# table of date and the number of occurrences
SA_day_20_21 <- sapply(unique(DF_WM_20_21_SA$Arrest.Date),
                       function(x){sum(DF_WM_20_21_SA$Arrest.Date==x)})
# of occurrences per day Frequency Relative frequency
0 297 0.4062927
1 267 0.3652531
2 112 0.1532148
3 43 0.0588235
4 8 0.0109439
5 3 0.004104
6 0 0
7 0 0
8 1 0.001368
9 0 0
x_SA <- 0:10
y_SA <- c(297,267,112,43,8,3,0,0,1,0,0)
fx <- dpois(x=x_SA, lambda=sum(SA_day_20_21)/(365*2+1))
data_SA <- data.frame(x_SA, y_SA, fx)

ggplot(data_SA, aes(x=x_SA,y=y_SA)) +
  ggtitle("Figure 25: Histogram of Simple Assault in 2020 - 2021") + 
  xlab("Number of occurrences per day") + 
  ylab("Frequency") +
  geom_bar(stat = "identity") 

ggplot(data_SA) +
  ggtitle("Figure 25: Relative frequency histogram of Simple Assault in 2020 - 2021 \n and Poisson distribution with lambda = 0.923") + 
  xlab("Number of occurrences per day") + 
  ylab("Relative frequency") +
  geom_bar(aes(x=x_SA,y=y_SA/(365*2+1)), stat = "identity") +
  geom_line(aes(x=x_SA,y=fx), color='red') +
  geom_point(aes(x=x_SA,y=fx), color='red')

Comparing Confidence Intervals

x <- c('Before COVID-19','After COVID-19')
pre_covid_interval <- c(sum(SA_day_16_19)/(365*4+1) - 2.58*(sum(SA_day_16_19)/(365*4+1)/(356*4+1))**0.5,
                        sum(SA_day_16_19)/(365*4+1) + 2.58*(sum(SA_day_16_19)/(365*4+1)/(356*4+1))**0.5)
post_covid_interval <- c(sum(SA_day_20_21)/(365*2+1) - 2.58*(sum(SA_day_20_21)/(365*2+1)/(356*2+1))**0.5,
                         sum(SA_day_20_21)/(365*2+1) + 2.58*(sum(SA_day_20_21)/(365*2+1)/(356*2+1))**0.5)
data_CI_SA <- data.frame(x,pre_covid_interval,post_covid_interval)

ggplot(data_CI_SA) +
  ggtitle("Figure 26: 99% Confidence Interval of lambda for Simple Assault") + 
  ylab("99% Confidence Interval of lambda") + 
  xlab("") +
  coord_flip() +
  geom_linerange(aes(x=x[1], ymin=pre_covid_interval[1], ymax=pre_covid_interval[2])) +
  geom_linerange(aes(x=x[2], ymin=post_covid_interval[1], ymax=post_covid_interval[2]))

Theft

Before COVID-19

DF_WM_16_19_T <- DF_WM_16_19[DF_WM_16_19$Arrest.Category=='Theft',]

# table of date and the number of occurrences
T_day_16_19 <- sapply(unique(DF_WM_16_19_T$Arrest.Date),
                       function(x){sum(DF_WM_16_19_T$Arrest.Date==x)})
# of occurrences per day Frequency Relative frequency
0 973 0.6659822
1 398 0.2724162
2 79 0.0540726
3 10 0.0068446
4 1 6.844627^{-4}
5 0 0
x_T <- 0:5
y_T <- c(973,398,79,10,1,0)
fx <- dpois(x=x_T, lambda=sum(T_day_16_19)/(365*4+1))
data_T <- data.frame(x_T, y_T, fx)

ggplot(data_T, aes(x=x_T,y=y_T)) +
  ggtitle("Figure : Histogram of Theft in 2016 - 2019") + 
  xlab("Number of occurrences per day") + 
  ylab("Frequency") +
  geom_bar(stat = "identity") 

ggplot(data_T) +
  ggtitle("Figure : Relative frequency histogram of Theft in 2016 - 2019 \n and Poisson distribution with lambda = 0.404") + 
  xlab("Number of occurrences per day") + 
  ylab("Relative frequency") +
  geom_bar(aes(x=x_T,y=y_T/(365*4+1)), stat = "identity") +
  geom_line(aes(x=x_T,y=fx), color='red') +
  geom_point(aes(x=x_T,y=fx), color='red')

After COVID-19

DF_WM_20_21_T <- DF_WM_20_21[DF_WM_20_21$Arrest.Category=='Theft',]

# table of date and the number of occurrences
T_day_20_21 <- sapply(unique(DF_WM_20_21_T$Arrest.Date),
                       function(x){sum(DF_WM_20_21_T$Arrest.Date==x)})
# of occurrences per day Frequency Relative frequency
0 653 0.8932969
1 72 0.0984952
2 6 0.0082079
3 0 0
x_T <- 0:3
y_T <- c(653,72,6,0)
fx <- dpois(x=x_T, lambda=sum(T_day_20_21)/(365*2+1))
data_T <- data.frame(x_T, y_T, fx)

ggplot(data_T, aes(x=x_T,y=y_T)) +
  ggtitle("Figure : Histogram of Theft in 2020 - 2021") + 
  xlab("Number of occurrences per day") + 
  ylab("Frequency") +
  geom_bar(stat = "identity") 

ggplot(data_T) +
  ggtitle("Figure : Relative frequency histogram of Theft in 2020 - 2021 \n and Poisson distribution with lambda = 0.115") + 
  xlab("Number of occurrences per day") + 
  ylab("Relative frequency") +
  geom_bar(aes(x=x_T,y=y_T/(365*2+1)), stat = "identity") +
  geom_line(aes(x=x_T,y=fx), color='red') +
  geom_point(aes(x=x_T,y=fx), color='red')

Comparing Confidence Intervals

x <- c('Before COVID-19','After COVID-19')
pre_covid_interval <- c(sum(T_day_16_19)/(365*4+1) - 2.58*(sum(T_day_16_19)/(365*4+1)/(356*4+1))**0.5,
                        sum(T_day_16_19)/(365*4+1) + 2.58*(sum(T_day_16_19)/(365*4+1)/(356*4+1))**0.5)
post_covid_interval <- c(sum(T_day_20_21)/(365*2+1) - 2.58*(sum(T_day_20_21)/(365*2+1)/(356*2+1))**0.5,
                         sum(T_day_20_21)/(365*2+1) + 2.58*(sum(T_day_20_21)/(365*2+1)/(356*2+1))**0.5)
data_CI_T <- data.frame(x,pre_covid_interval,post_covid_interval)

ggplot(data_CI_T) +
  ggtitle("Figure : 99% Confidence Interval of lambda for Theft") + 
  ylab("99% Confidence Interval of lambda") + 
  xlab("") +
  coord_flip() +
  geom_linerange(aes(x=x[1], ymin=pre_covid_interval[1], ymax=pre_covid_interval[2])) +
  geom_linerange(aes(x=x[2], ymin=post_covid_interval[1], ymax=post_covid_interval[2]))

check

# table of date and the number of occurrences
SA_day_16 <- sapply(unique(DF_WM_16_19_SA[DF_WM_16_19_SA$Arrest.Year==2016,]$Arrest.Date),
                    function(x){sum(DF_WM_16_19_SA[DF_WM_16_19_SA$Arrest.Year==2016,]$Arrest.Date==x)})
SA_day_17 <- sapply(unique(DF_WM_16_19_SA[DF_WM_16_19_SA$Arrest.Year==2017,]$Arrest.Date),
                    function(x){sum(DF_WM_16_19_SA[DF_WM_16_19_SA$Arrest.Year==2017,]$Arrest.Date==x)})
SA_day_18 <- sapply(unique(DF_WM_16_19_SA[DF_WM_16_19_SA$Arrest.Year==2018,]$Arrest.Date),
                    function(x){sum(DF_WM_16_19_SA[DF_WM_16_19_SA$Arrest.Year==2018,]$Arrest.Date==x)})
SA_day_19 <- sapply(unique(DF_WM_16_19_SA[DF_WM_16_19_SA$Arrest.Year==2019,]$Arrest.Date),
                    function(x){sum(DF_WM_16_19_SA[DF_WM_16_19_SA$Arrest.Year==2019,]$Arrest.Date==x)})

# 2016
x_SA <- 0:10
y_SA <- c(365+1-length(SA_day_16),length(SA_day_16[SA_day_16==1]),length(SA_day_16[SA_day_16==2]),
          length(SA_day_16[SA_day_16==3]),length(SA_day_16[SA_day_16==4]),
          length(SA_day_16[SA_day_16==5]),length(SA_day_16[SA_day_16==6]),
          length(SA_day_16[SA_day_16==7]),length(SA_day_16[SA_day_16==8]),
          length(SA_day_16[SA_day_16==9]),length(SA_day_16[SA_day_16==10]))
fx <- dpois(x=x_SA, lambda=sum(SA_day_16)/(365+1))
data_SA <- data.frame(x_SA, y_SA, fx)

ggplot(data_SA) +
  ggtitle("Figure: Relative frequency histogram of Simple Assault in 2016 
          \n and Poisson distribution with lambda = 1.5") + 
  xlab("Number of occurrences per day") + 
  ylab("Relative frequency") +
  geom_bar(aes(x=x_SA,y=y_SA/(365+1)), stat = "identity") +
  geom_line(aes(x=x_SA,y=fx), color='red') +
  geom_point(aes(x=x_SA,y=fx), color='red')

# 2017
x_SA <- 0:10
y_SA <- c(365-length(SA_day_17),length(SA_day_17[SA_day_17==1]),length(SA_day_17[SA_day_17==2]),
          length(SA_day_17[SA_day_17==3]),length(SA_day_17[SA_day_17==4]),
          length(SA_day_17[SA_day_17==5]),length(SA_day_17[SA_day_17==6]),
          length(SA_day_17[SA_day_17==7]),length(SA_day_17[SA_day_17==8]),
          length(SA_day_17[SA_day_17==9]),length(SA_day_17[SA_day_17==10]))
fx <- dpois(x=x_SA, lambda=sum(SA_day_17)/365)
data_SA <- data.frame(x_SA, y_SA, fx)

ggplot(data_SA) +
  ggtitle("Figure: Relative frequency histogram of Simple Assault in 2017 
          \n and Poisson distribution with lambda = 1.33") + 
  xlab("Number of occurrences per day") + 
  ylab("Relative frequency") +
  geom_bar(aes(x=x_SA,y=y_SA/365), stat = "identity") +
  geom_line(aes(x=x_SA,y=fx), color='red') +
  geom_point(aes(x=x_SA,y=fx), color='red')

# 2018
x_SA <- 0:10
y_SA <- c(365-length(SA_day_18),length(SA_day_18[SA_day_18==1]),length(SA_day_18[SA_day_18==2]),
          length(SA_day_18[SA_day_18==3]),length(SA_day_18[SA_day_18==4]),
          length(SA_day_18[SA_day_18==5]),length(SA_day_18[SA_day_18==6]),
          length(SA_day_18[SA_day_18==7]),length(SA_day_18[SA_day_18==8]),
          length(SA_day_18[SA_day_18==9]),length(SA_day_18[SA_day_18==10]))
fx <- dpois(x=x_SA, lambda=sum(SA_day_18)/365)
data_SA <- data.frame(x_SA, y_SA, fx)

ggplot(data_SA) +
  ggtitle("Figure: Relative frequency histogram of Simple Assault in 2018 
          \n and Poisson distribution with lambda = 1.32") + 
  xlab("Number of occurrences per day") + 
  ylab("Relative frequency") +
  geom_bar(aes(x=x_SA,y=y_SA/365), stat = "identity") +
  geom_line(aes(x=x_SA,y=fx), color='red') +
  geom_point(aes(x=x_SA,y=fx), color='red')

# 2019
x_SA <- 0:10
y_SA <- c(365-length(SA_day_19),length(SA_day_19[SA_day_19==1]),length(SA_day_19[SA_day_19==2]),
          length(SA_day_19[SA_day_19==3]),length(SA_day_19[SA_day_19==4]),
          length(SA_day_19[SA_day_19==5]),length(SA_day_19[SA_day_19==6]),
          length(SA_day_19[SA_day_19==7]),length(SA_day_19[SA_day_19==8]),
          length(SA_day_19[SA_day_19==9]),length(SA_day_19[SA_day_19==10]))
fx <- dpois(x=x_SA, lambda=sum(SA_day_19)/365)
data_SA <- data.frame(x_SA, y_SA, fx)

ggplot(data_SA) +
  ggtitle("Figure: Relative frequency histogram of Simple Assault in 2019 
          \n and Poisson distribution with lambda = 1.3") + 
  xlab("Number of occurrences per day") + 
  ylab("Relative frequency") +
  geom_bar(aes(x=x_SA,y=y_SA/365), stat = "identity") +
  geom_line(aes(x=x_SA,y=fx), color='red') +
  geom_point(aes(x=x_SA,y=fx), color='red')

# table of date and the number of occurrences
SA_day_20 <- sapply(unique(DF_WM_20_21_SA[DF_WM_20_21_SA$Arrest.Year==2020,]$Arrest.Date),
                    function(x){sum(DF_WM_20_21_SA[DF_WM_20_21_SA$Arrest.Year==2020,]$Arrest.Date==x)})
SA_day_21 <- sapply(unique(DF_WM_20_21_SA[DF_WM_20_21_SA$Arrest.Year==2021,]$Arrest.Date),
                    function(x){sum(DF_WM_20_21_SA[DF_WM_20_21_SA$Arrest.Year==2021,]$Arrest.Date==x)})
# 2020
x_SA <- 0:10
y_SA <- c(365+1-length(SA_day_20),length(SA_day_20[SA_day_20==1]),length(SA_day_20[SA_day_20==2]),
          length(SA_day_20[SA_day_20==3]),length(SA_day_20[SA_day_20==4]),
          length(SA_day_20[SA_day_20==5]),length(SA_day_20[SA_day_20==6]),
          length(SA_day_20[SA_day_20==7]),length(SA_day_20[SA_day_20==8]),
          length(SA_day_20[SA_day_20==9]),length(SA_day_20[SA_day_20==10]))
fx <- dpois(x=x_SA, lambda=sum(SA_day_20)/(365+1))
data_SA <- data.frame(x_SA, y_SA, fx)

ggplot(data_SA) +
  ggtitle("Figure: Relative frequency histogram of Simple Assault in 2020 
          \n and Poisson distribution with lambda = 0.954") + 
  xlab("Number of occurrences per day") + 
  ylab("Relative frequency") +
  geom_bar(aes(x=x_SA,y=y_SA/(365+1)), stat = "identity") +
  geom_line(aes(x=x_SA,y=fx), color='red') +
  geom_point(aes(x=x_SA,y=fx), color='red')

# 2021
x_SA <- 0:10
y_SA <- c(365-length(SA_day_21),length(SA_day_21[SA_day_21==1]),length(SA_day_21[SA_day_21==2]),
          length(SA_day_21[SA_day_21==3]),length(SA_day_21[SA_day_21==4]),
          length(SA_day_21[SA_day_21==5]),length(SA_day_21[SA_day_21==6]),
          length(SA_day_21[SA_day_21==7]),length(SA_day_21[SA_day_21==8]),
          length(SA_day_21[SA_day_21==9]),length(SA_day_21[SA_day_21==10]))
fx <- dpois(x=x_SA, lambda=sum(SA_day_21)/365)
data_SA <- data.frame(x_SA, y_SA, fx)

ggplot(data_SA) +
  ggtitle("Figure: Relative frequency histogram of Simple Assault in 2021 
          \n and Poisson distribution with lambda = 0.893") + 
  xlab("Number of occurrences per day") + 
  ylab("Relative frequency") +
  geom_bar(aes(x=x_SA,y=y_SA/365), stat = "identity") +
  geom_line(aes(x=x_SA,y=fx), color='red') +
  geom_point(aes(x=x_SA,y=fx), color='red')

x <- c('2016','2017','2018','2019','2020','2021')
interval_2016 <- c(sum(SA_day_16)/(365+1) - 2.58*(sum(SA_day_16)/(365+1)/(356+1))**0.5,
                   sum(SA_day_16)/(365+1) + 2.58*(sum(SA_day_16)/(365+1)/(356+1))**0.5)
interval_2017 <- c(sum(SA_day_17)/(365) - 2.58*(sum(SA_day_17)/(365)/(356))**0.5,
                   sum(SA_day_17)/(365) + 2.58*(sum(SA_day_17)/(365)/(356))**0.5)
interval_2018 <- c(sum(SA_day_18)/(365) - 2.58*(sum(SA_day_18)/(365)/(356))**0.5,
                   sum(SA_day_18)/(365) + 2.58*(sum(SA_day_18)/(365)/(356))**0.5)
interval_2019 <- c(sum(SA_day_19)/(365) - 2.58*(sum(SA_day_19)/(365)/(356))**0.5,
                   sum(SA_day_19)/(365) + 2.58*(sum(SA_day_19)/(365)/(356))**0.5)
interval_2020 <- c(sum(SA_day_20)/(365+1) - 2.58*(sum(SA_day_20)/(365+1)/(356+1))**0.5,
                   sum(SA_day_20)/(365+1) + 2.58*(sum(SA_day_20)/(365+1)/(356+1))**0.5)
interval_2021 <- c(sum(SA_day_21)/(365) - 2.58*(sum(SA_day_21)/(365)/(356))**0.5,
                   sum(SA_day_21)/(365) + 2.58*(sum(SA_day_21)/(365)/(356))**0.5)
data_CI_SA <- data.frame(x,interval_2016,interval_2017,interval_2018,interval_2019,interval_2020,interval_2021)

ggplot(data_CI_SA) +
  ggtitle("Figure: 99% Confidence Interval of lambda for Simple Assault") + 
  ylab("99% Confidence Interval of lambda") + 
  xlab("") +
  coord_flip() +
  geom_linerange(aes(x=x[1], ymin=interval_2016[1], ymax=interval_2016[2])) +
  geom_linerange(aes(x=x[2], ymin=interval_2017[1], ymax=interval_2017[2])) +
  geom_linerange(aes(x=x[3], ymin=interval_2018[1], ymax=interval_2018[2])) +
  geom_linerange(aes(x=x[4], ymin=interval_2019[1], ymax=interval_2019[2])) +
  geom_linerange(aes(x=x[5], ymin=interval_2020[1], ymax=interval_2020[2])) +
  geom_linerange(aes(x=x[6], ymin=interval_2021[1], ymax=interval_2021[2]))

mean16 = sum(SA_day_16)/(365+1)
mean17 = sum(SA_day_17)/(365)
mean18 = sum(SA_day_18)/(365)
mean19 = sum(SA_day_19)/(365)
mean20 = sum(SA_day_20)/(365+1)
mean21 = sum(SA_day_21)/(365)

var16 = (((0-mean16)**2)*(366 - length(SA_day_16)) + ((1-mean16)**2)*length(SA_day_16[SA_day_16==1]) +
   ((2-mean16)**2)*length(SA_day_16[SA_day_16==2]) + ((3-mean16)**2)*length(SA_day_16[SA_day_16==3]) +
   ((4-mean16)**2)*length(SA_day_16[SA_day_16==4]) + ((5-mean16)**2)*length(SA_day_16[SA_day_16==5]) +
   ((6-mean16)**2)*length(SA_day_16[SA_day_16==6]) + ((7-mean16)**2)*length(SA_day_16[SA_day_16==7]) +
   ((8-mean16)**2)*length(SA_day_16[SA_day_16==8]) + ((9-mean16)**2)*length(SA_day_16[SA_day_16==9]))/365

var17 = (((0-mean17)**2)*(365 - length(SA_day_17)) + ((1-mean17)**2)*length(SA_day_17[SA_day_17==1]) +
   ((2-mean17)**2)*length(SA_day_17[SA_day_17==2]) + ((3-mean17)**2)*length(SA_day_17[SA_day_17==3]) +
   ((4-mean17)**2)*length(SA_day_17[SA_day_17==4]) + ((5-mean17)**2)*length(SA_day_17[SA_day_17==5]) +
   ((6-mean17)**2)*length(SA_day_17[SA_day_17==6]) + ((7-mean17)**2)*length(SA_day_17[SA_day_17==7]) +
   ((8-mean17)**2)*length(SA_day_17[SA_day_17==8]) + ((9-mean17)**2)*length(SA_day_17[SA_day_17==9]))/364

var18 = (((0-mean18)**2)*(366 - length(SA_day_18)) + ((1-mean18)**2)*length(SA_day_18[SA_day_18==1]) +
   ((2-mean18)**2)*length(SA_day_18[SA_day_18==2]) + ((3-mean18)**2)*length(SA_day_18[SA_day_18==3]) +
   ((4-mean18)**2)*length(SA_day_18[SA_day_18==4]) + ((5-mean18)**2)*length(SA_day_18[SA_day_18==5]) +
   ((6-mean18)**2)*length(SA_day_18[SA_day_18==6]) + ((7-mean18)**2)*length(SA_day_18[SA_day_18==7]) +
   ((8-mean18)**2)*length(SA_day_18[SA_day_18==8]) + ((9-mean18)**2)*length(SA_day_18[SA_day_18==9]))/364

var19 = (((0-mean19)**2)*(366 - length(SA_day_19)) + ((1-mean19)**2)*length(SA_day_19[SA_day_19==1]) +
   ((2-mean19)**2)*length(SA_day_19[SA_day_19==2]) + ((3-mean19)**2)*length(SA_day_19[SA_day_19==3]) +
   ((4-mean19)**2)*length(SA_day_19[SA_day_19==4]) + ((5-mean19)**2)*length(SA_day_19[SA_day_19==5]) +
   ((6-mean19)**2)*length(SA_day_19[SA_day_19==6]) + ((7-mean19)**2)*length(SA_day_19[SA_day_19==7]) +
   ((8-mean19)**2)*length(SA_day_19[SA_day_19==8]) + ((9-mean19)**2)*length(SA_day_19[SA_day_19==9]))/364

var20 = (((0-mean20)**2)*(366 - length(SA_day_20)) + ((1-mean20)**2)*length(SA_day_20[SA_day_20==1]) +
   ((2-mean20)**2)*length(SA_day_16[SA_day_20==2]) + ((3-mean20)**2)*length(SA_day_20[SA_day_20==3]) +
   ((4-mean20)**2)*length(SA_day_16[SA_day_20==4]) + ((5-mean20)**2)*length(SA_day_20[SA_day_20==5]) +
   ((6-mean20)**2)*length(SA_day_16[SA_day_20==6]) + ((7-mean20)**2)*length(SA_day_20[SA_day_20==7]) +
   ((8-mean20)**2)*length(SA_day_16[SA_day_20==8]) + ((9-mean20)**2)*length(SA_day_20[SA_day_20==9]))/365

var21 = (((0-mean21)**2)*(366 - length(SA_day_21)) + ((1-mean21)**2)*length(SA_day_21[SA_day_21==1]) +
   ((2-mean21)**2)*length(SA_day_21[SA_day_21==2]) + ((3-mean21)**2)*length(SA_day_21[SA_day_21==3]) +
   ((4-mean21)**2)*length(SA_day_21[SA_day_21==4]) + ((5-mean21)**2)*length(SA_day_21[SA_day_21==5]) +
   ((6-mean21)**2)*length(SA_day_21[SA_day_21==6]) + ((7-mean21)**2)*length(SA_day_21[SA_day_21==7]) +
   ((8-mean21)**2)*length(SA_day_21[SA_day_21==8]) + ((9-mean21)**2)*length(SA_day_21[SA_day_21==9]))/364

print('The mean and variance of SA in 2016 are ')
mean16
var16
print('The mean and variance of SA in 2017 are ')
mean17
var17
print('The mean and variance of SA in 2018 are ')
mean18
var18
print('The mean and variance of SA in 2019 are ')
mean19
var19
print('The mean and variance of SA in 2020 are ')
mean20
var20
print('The mean and variance of SA in 2021 are ')
mean21
var21

Other crimes

Release Violations/Fugitive

DF_WM_16_19_RV <- DF_WM_16_19[DF_WM_16_19$Arrest.Category=='Release Violations/Fugitive',]

# table of date and the number of occurrences
RV_day_16_19 <- sapply(unique(DF_WM_16_19_RV$Arrest.Date),
                       function(x){sum(DF_WM_16_19_RV$Arrest.Date==x)})
# of occurrences per day Frequency Relative frequency
0 817 0.559206
1 428 0.29295
2 168 0.1149897
3 40 0.0273785
4 5 0.0034223
5 3 0.0020534
6 0 0
x_RV <- 0:6
y_RV <- c(817,428,168,40,5,3,0)
fx <- dpois(x=x_RV, lambda=sum(RV_day_16_19)/(365*4+1))
data_RV <- data.frame(x_RV, y_RV, fx)

ggplot(data_RV, aes(x=x_RV,y=y_RV)) +
  ggtitle("Figure : Histogram of Release Violations/Fugitive in 2016 - 2019") + 
  xlab("Number of occurrences per day") + 
  ylab("Frequency") +
  geom_bar(stat = "identity") 

ggplot(data_RV) +
  ggtitle("Figure : Relative frequency histogram of Release Violations/Fugitive in 2016 - 2019 \n and Poisson distribution with lambda = 0.629") + 
  xlab("Number of occurrences per day") + 
  ylab("Relative frequency") +
  geom_bar(aes(x=x_RV,y=y_RV/(365*4+1)), stat = "identity") +
  geom_line(aes(x=x_RV,y=fx), color='red') +
  geom_point(aes(x=x_RV,y=fx), color='red')

DF_WM_20_21_RV <- DF_WM_20_21[DF_WM_20_21$Arrest.Category=='Release Violations/Fugitive',]

# table of date and the number of occurrences
RV_day_20_21 <- sapply(unique(DF_WM_20_21_RV$Arrest.Date),
                       function(x){sum(DF_WM_20_21_RV$Arrest.Date==x)})
# of occurrences per day Frequency Relative frequency
0 549 0.751026
1 154 0.2106703
2 24 0.0328317
3 4 0.005472
4 0 0
x_RV <- 0:4
y_RV <- c(549,154,24,4,0)
fx <- dpois(x=x_RV, lambda=sum(RV_day_20_21)/(365*2+1))
data_RV <- data.frame(x_RV, y_RV, fx)

ggplot(data_RV, aes(x=x_RV,y=y_RV)) +
  ggtitle("Figure : Histogram of Release Violations/Fugitive in 2020 - 2021") + 
  xlab("Number of occurrences per day") + 
  ylab("Frequency") +
  geom_bar(stat = "identity") 

ggplot(data_RV) +
  ggtitle("Figure : Relative frequency histogram of Release Violations/Fugitive in 2020 - 2021 \n and Poisson distribution with lambda = 0.293") + 
  xlab("Number of occurrences per day") + 
  ylab("Relative frequency") +
  geom_bar(aes(x=x_RV,y=y_RV/(365*2+1)), stat = "identity") +
  geom_line(aes(x=x_RV,y=fx), color='red') +
  geom_point(aes(x=x_RV,y=fx), color='red')

x <- c('Before COVID-19','After COVID-19')
pre_covid_interval <- c(sum(RV_day_16_19)/(365*4+1) - 2.58*(sum(RV_day_16_19)/(365*4+1)/(356*4+1))**0.5,
                        sum(RV_day_16_19)/(365*4+1) + 2.58*(sum(RV_day_16_19)/(365*4+1)/(356*4+1))**0.5)
post_covid_interval <- c(sum(RV_day_20_21)/(365*2+1) - 2.58*(sum(RV_day_20_21)/(365*2+1)/(356*2+1))**0.5,
                         sum(RV_day_20_21)/(365*2+1) + 2.58*(sum(RV_day_20_21)/(365*2+1)/(356*2+1))**0.5)
data_CI_RV <- data.frame(x,pre_covid_interval,post_covid_interval)

ggplot(data_CI_RV) +
  ggtitle("Figure : 99% Confidence Interval of lambda for Release Violations/Fugitive") + 
  ylab("99% Confidence Interval of lambda") + 
  xlab("") +
  coord_flip() +
  geom_linerange(aes(x=x[1], ymin=pre_covid_interval[1], ymax=pre_covid_interval[2])) +
  geom_linerange(aes(x=x[2], ymin=post_covid_interval[1], ymax=post_covid_interval[2]))

Driving/Boating While Intoxicated

DF_WM_16_19_DI <- DF_WM_16_19[DF_WM_16_19$Arrest.Category=='Driving/Boating While Intoxicated',]

# table of date and the number of occurrences
DI_day_16_19 <- sapply(unique(DF_WM_16_19_DI$Arrest.Date),
                       function(x){sum(DF_WM_16_19_DI$Arrest.Date==x)})
# of occurrences per day Frequency Relative frequency
0 875 0.5989049
1 416 0.2847365
2 138 0.0944559
3 25 0.0171116
4 6 0.0041068
5 1 6.844627^{-4}
6 0 0
x_DI <- 0:6
y_DI <- c(875,416,138,25,6,1,0)
fx <- dpois(x=x_DI, lambda=sum(DI_day_16_19)/(365*4+1))
data_DI <- data.frame(x_DI, y_DI, fx)

ggplot(data_DI, aes(x=x_DI,y=y_DI)) +
  ggtitle("Figure : Histogram of Driving/Boating While Intoxicated in 2016 - 2019") + 
  xlab("Number of occurrences per day") + 
  ylab("Frequency") +
  geom_bar(stat = "identity") 

ggplot(data_DI) +
  ggtitle("Figure : Relative frequency histogram of Driving/Boating While Intoxicated in 2016 - 2019 \n and Poisson distribution with lambda = 0.545") + 
  xlab("Number of occurrences per day") + 
  ylab("Relative frequency") +
  geom_bar(aes(x=x_DI,y=y_DI/(365*4+1)), stat = "identity") +
  geom_line(aes(x=x_DI,y=fx), color='red') +
  geom_point(aes(x=x_DI,y=fx), color='red')

DF_WM_20_21_DI <- DF_WM_20_21[DF_WM_20_21$Arrest.Category=='Driving/Boating While Intoxicated',]

# table of date and the number of occurrences
DI_day_20_21 <- sapply(unique(DF_WM_20_21_DI$Arrest.Date),
                       function(x){sum(DF_WM_20_21_DI$Arrest.Date==x)})
# of occurrences per day Frequency Relative frequency
0 527 0.7209302
1 168 0.2298222
2 29 0.0396717
3 5 0.0068399
4 2 0.002736
5 0 0
x_DI <- 0:5
y_DI <- c(527,168,29,5,2,0)
fx <- dpois(x=x_DI, lambda=sum(DI_day_20_21)/(365*2+1))
data_DI <- data.frame(x_DI, y_DI, fx)

ggplot(data_DI, aes(x=x_DI,y=y_DI)) +
  ggtitle("Figure : Histogram of Driving/Boating While Intoxicated in 2020 - 2021") + 
  xlab("Number of occurrences per day") + 
  ylab("Frequency") +
  geom_bar(stat = "identity") 

ggplot(data_DI) +
  ggtitle("Figure : Relative frequency histogram of Driving/Boating While Intoxicated in 2020 - 2021 \n and Poisson distribution with lambda = 0.341") + 
  xlab("Number of occurrences per day") + 
  ylab("Relative frequency") +
  geom_bar(aes(x=x_DI,y=y_DI/(365*2+1)), stat = "identity") +
  geom_line(aes(x=x_DI,y=fx), color='red') +
  geom_point(aes(x=x_DI,y=fx), color='red')

x <- c('Before COVID-19','After COVID-19')
pre_covid_interval <- c(sum(DI_day_16_19)/(365*4+1) - 2.58*(sum(DI_day_16_19)/(365*4+1)/(356*4+1))**0.5,
                        sum(DI_day_16_19)/(365*4+1) + 2.58*(sum(DI_day_16_19)/(365*4+1)/(356*4+1))**0.5)
post_covid_interval <- c(sum(DI_day_20_21)/(365*2+1) - 2.58*(sum(DI_day_20_21)/(365*2+1)/(356*2+1))**0.5,
                         sum(DI_day_20_21)/(365*2+1) + 2.58*(sum(DI_day_20_21)/(365*2+1)/(356*2+1))**0.5)
data_CI_DI <- data.frame(x,pre_covid_interval,post_covid_interval)

ggplot(data_CI_DI) +
  ggtitle("Figure : 99% Confidence Interval of lambda for Driving/Boating While Intoxicated") + 
  ylab("99% Confidence Interval of lambda") + 
  xlab("") +
  coord_flip() +
  geom_linerange(aes(x=x[1], ymin=pre_covid_interval[1], ymax=pre_covid_interval[2])) +
  geom_linerange(aes(x=x[2], ymin=post_covid_interval[1], ymax=post_covid_interval[2]))

Narcotics

DF_WM_16_19_N <- DF_WM_16_19[DF_WM_16_19$Arrest.Category=='Narcotics',]

# table of date and the number of occurrences
N_day_16_19 <- sapply(unique(DF_WM_16_19_N$Arrest.Date),
                       function(x){sum(DF_WM_16_19_N$Arrest.Date==x)})
# of occurrences per day Frequency Relative frequency
0 1053 0.7207392
1 318 0.2176591
2 66 0.0451745
3 10 0.0068446
4 5 0.0034223
5 6 0.0041068
6 0 0
7 1 6.844627^{-4}
8 0 0
9 1 6.844627^{-4}
10 0 0
11 0 0
12 0 0
13 1 6.844627^{-4}
14 0 0
x_N <- 0:14
y_N <- c(1053,318,66,10,5,6,0,1,0,1,0,0,0,1,0)
fx <- dpois(x=x_N, lambda=sum(N_day_16_19)/(365*4+1))
data_N <- data.frame(x_N, y_N, fx)

ggplot(data_N, aes(x=x_N,y=y_N)) +
  ggtitle("Figure : Histogram of Narcotics in 2016 - 2019") + 
  xlab("Number of occurrences per day") + 
  ylab("Frequency") +
  geom_bar(stat = "identity") 

ggplot(data_N) +
  ggtitle("Figure : Relative frequency histogram of Narcotics in 2016 - 2019 \n and Poisson distribution with lambda = 0.383") + 
  xlab("Number of occurrences per day") + 
  ylab("Relative frequency") +
  geom_bar(aes(x=x_N,y=y_N/(365*4+1)), stat = "identity") +
  geom_line(aes(x=x_N,y=fx), color='red') +
  geom_point(aes(x=x_N,y=fx), color='red')

DF_WM_20_21_N <- DF_WM_20_21[DF_WM_20_21$Arrest.Category=='Narcotics',]

# table of date and the number of occurrences
N_day_20_21 <- sapply(unique(DF_WM_20_21_N$Arrest.Date),
                       function(x){sum(DF_WM_20_21_N$Arrest.Date==x)})
# of occurrences per day Frequency Relative frequency
0 652 0.8919289
1 66 0.0902873
2 10 0.0136799
3 3 0.004104
4 0 0
x_N <- 0:4
y_N <- c(652,66,10,3,0)
fx <- dpois(x=x_N, lambda=sum(N_day_20_21)/(365*2+1))
data_N <- data.frame(x_N, y_N, fx)

ggplot(data_N, aes(x=x_N,y=y_N)) +
  ggtitle("Figure : Histogram of Narcotics in 2020 - 2021") + 
  xlab("Number of occurrences per day") + 
  ylab("Frequency") +
  geom_bar(stat = "identity") 

ggplot(data_N) +
  ggtitle("Figure : Relative frequency histogram of Narcotics in 2020 - 2021 \n and Poisson distribution with lambda = 0.13") + 
  xlab("Number of occurrences per day") + 
  ylab("Relative frequency") +
  geom_bar(aes(x=x_N,y=y_N/(365*2+1)), stat = "identity") +
  geom_line(aes(x=x_N,y=fx), color='red') +
  geom_point(aes(x=x_N,y=fx), color='red')

x <- c('Before COVID-19','After COVID-19')
pre_covid_interval <- c(sum(N_day_16_19)/(365*4+1) - 2.58*(sum(N_day_16_19)/(365*4+1)/(356*4+1))**0.5,
                        sum(N_day_16_19)/(365*4+1) + 2.58*(sum(N_day_16_19)/(365*4+1)/(356*4+1))**0.5)
post_covid_interval <- c(sum(N_day_20_21)/(365*2+1) - 2.58*(sum(N_day_20_21)/(365*2+1)/(356*2+1))**0.5,
                         sum(N_day_20_21)/(365*2+1) + 2.58*(sum(N_day_20_21)/(365*2+1)/(356*2+1))**0.5)
data_CI_N <- data.frame(x,pre_covid_interval,post_covid_interval)

ggplot(data_CI_N) +
  ggtitle("Figure : 99% Confidence Interval of lambda for Narcotics") + 
  ylab("99% Confidence Interval of lambda") + 
  xlab("") +
  coord_flip() +
  geom_linerange(aes(x=x[1], ymin=pre_covid_interval[1], ymax=pre_covid_interval[2])) +
  geom_linerange(aes(x=x[2], ymin=post_covid_interval[1], ymax=post_covid_interval[2]))

Liquor Law Violations

DF_WM_16_19_LV <- DF_WM_16_19[DF_WM_16_19$Arrest.Category=='Liquor Law Violations',]

# table of date and the number of occurrences
LV_day_16_19 <- sapply(unique(DF_WM_16_19_LV$Arrest.Date),
                       function(x){sum(DF_WM_16_19_LV$Arrest.Date==x)})
# of occurrences per day Frequency Relative frequency
0 1090 0.7460643
1 259 0.1772758
2 76 0.0520192
3 32 0.0219028
4 3 0.0020534
5 1 6.844627^{-4}
6 0 0
x_LV <- 0:6
y_LV <- c(1090,259,76,32,3,1,0)
fx <- dpois(x=x_LV, lambda=sum(LV_day_16_19)/(365*4+1))
data_LV <- data.frame(x_LV, y_LV, fx)

ggplot(data_LV, aes(x=x_LV,y=y_LV)) +
  ggtitle("Figure : Histogram of Liquor Law Violations in 2016 - 2019") + 
  xlab("Number of occurrences per day") + 
  ylab("Frequency") +
  geom_bar(stat = "identity") 

ggplot(data_LV) +
  ggtitle("Figure : Relative frequency histogram of Liquor Law Violations in 2016 - 2019 \n and Poisson distribution with lambda = 0.359") + 
  xlab("Number of occurrences per day") + 
  ylab("Relative frequency") +
  geom_bar(aes(x=x_LV,y=y_LV/(365*4+1)), stat = "identity") +
  geom_line(aes(x=x_LV,y=fx), color='red') +
  geom_point(aes(x=x_LV,y=fx), color='red')

DF_WM_20_21_LV <- DF_WM_20_21[DF_WM_20_21$Arrest.Category=='Liquor Law Violations',]

# table of date and the number of occurrences
LV_day_20_21 <- sapply(unique(DF_WM_20_21_LV$Arrest.Date),
                       function(x){sum(DF_WM_20_21_LV$Arrest.Date==x)})
# of occurrences per day Frequency Relative frequency
0 699 0.9562244
1 27 0.0369357
2 4 0.005472
3 1 0.001368
4 0 0
x_LV <- 0:4
y_LV <- c(699,27,4,1,0)
fx <- dpois(x=x_LV, lambda=sum(LV_day_20_21)/(365*2+1))
data_LV <- data.frame(x_LV, y_LV, fx)

ggplot(data_LV, aes(x=x_LV,y=y_LV)) +
  ggtitle("Figure : Histogram of Liquor Law Violations in 2020 - 2021") + 
  xlab("Number of occurrences per day") + 
  ylab("Frequency") +
  geom_bar(stat = "identity") 

ggplot(data_LV) +
  ggtitle("Figure : Relative frequency histogram of Liquor Law Violations in 2020 - 2021 \n and Poisson distribution with lambda = 0.052") + 
  xlab("Number of occurrences per day") + 
  ylab("Relative frequency") +
  geom_bar(aes(x=x_LV,y=y_LV/(365*2+1)), stat = "identity") +
  geom_line(aes(x=x_LV,y=fx), color='red') +
  geom_point(aes(x=x_LV,y=fx), color='red')

x <- c('Before COVID-19','After COVID-19')
pre_covid_interval <- c(sum(LV_day_16_19)/(365*4+1) - 2.58*(sum(LV_day_16_19)/(365*4+1)/(356*4+1))**0.5,
                        sum(LV_day_16_19)/(365*4+1) + 2.58*(sum(LV_day_16_19)/(365*4+1)/(356*4+1))**0.5)
post_covid_interval <- c(sum(LV_day_20_21)/(365*2+1) - 2.58*(sum(LV_day_20_21)/(365*2+1)/(356*2+1))**0.5,
                         sum(LV_day_20_21)/(365*2+1) + 2.58*(sum(LV_day_20_21)/(365*2+1)/(356*2+1))**0.5)
data_CI_LV <- data.frame(x,pre_covid_interval,post_covid_interval)

ggplot(data_CI_LV) +
  ggtitle("Figure : 99% Confidence Interval of lambda for Liquor Law Violations") + 
  ylab("99% Confidence Interval of lambda") + 
  xlab("") +
  coord_flip() +
  geom_linerange(aes(x=x[1], ymin=pre_covid_interval[1], ymax=pre_covid_interval[2])) +
  geom_linerange(aes(x=x[2], ymin=post_covid_interval[1], ymax=post_covid_interval[2]))